1.  Foreword

Following is the Center for Imaging Science (CIS) final report for 1999-2001 on the development of the fundamental
underpinnings for the representation and understanding of complex scenes. CIS is composed of researchers from MIT,
Ohio State, Smith-Kettlewell Eye Research Institute, University of Illinois, University of Texas at Austin, University of
Texas at El Paso, Washington University and Yale. Reflecting the broad nature of imaging science, the research at CIS
is multidisciplinary, encompassing physics, mathematics, electrical engineering, computer vision, computer science and
cognitive science.

The efforts of CIS during 1999-2001 built on the mathematical foundations that have been emerging over the
past several decades. Examining a broad class of remote sensing problems, we have been establishing the fundamental
framework for the inference and representation of structures in complex systems and scenes of complex shapes,
proceeding from the representation of complex scenes, to image formation and sensor modeling, and culminating in the
development of the fundamental underpinnings for optimal decision/recognition strategies in image understanding and
automatic target recognition (ATR). Within this framework, we are developing the methodology for establishing the
limits of performance of detection, identification and recognition algorithms that solve remote sensing problems
involving data from multiple active and passive sensors.
4.  Statement of the problem studied

These have been extremely productive years, including numerous publications, talks, visits to Army Laboratories,
database developments and distribution to academic and government researchers, and community service.

Contributions  are  in  the  5  major  thrust  areas  that  CIS  is  pursuing: 

1) Fundamental Bounds and Metrics for Detection/Tracking/Identification;

2) The Infinity of Variation in FLIR/Optical and Radar Signatures;

3) Multiple Sensor Fusion and Information Theory;

4) Information Theory Based Complexity of Representation and Compression; and

5) Databases and Clutter: Collection and Characterization.

Results include: derived bounds for aimpoint across scale; derived bounds for aimpoint in clutter; derived bounds
for aimpoint using LADAR/FLIR; and derived bit rate information bounds for VIDEO sensor.

The Web site continues to receive 5000-7000 visitors a month and has distributed databases to an average of 500 unique
visitors each month. CIS members have submitted 6 papers for publication, published 34 papers, given 32 talks, and have
been involved in numerous meetings with the Army and DOD. CIS has a publications index of 253 papers in the areas of
target  recognition,  deformable  geometry,  image  formation  and  sensor  modeling,  tomographic  imaging,  computational 
vision,  inference  and  optimization,  adaptation  and  learning,  multiple  sensor  fusion,  clutter  modeling,  and  performance 
bounds. 

5.  Summary  of  the  most  important  results 
The  major  results  in  the  Center: 

1. Performance bounds and metrics: We have established performance bounds and metrics for pose estimation
and identification. For pose, we have established the minimum mean-squared error (MMSE) estimator and
performance curves for multiple sensors: FLIR, LADAR, HRR, and VIDEO.

We  have  established  the  tight  connection  between  estimator  bounds  and  ID  bounds.  Exponential  ID  bounds  are 
determined  by  estimator  accuracy  bounds. 

2. Sensor fusion: We have established the methodology for optimally combining information from multiple
sensors. Estimation and ID bounds have been calculated for combinations of HRR, LADAR, and FLIR.

3.  FLIR  signature  variation:  We  have  established  a  methodology  for  accommodating  the  infinity  of  signature 
variation in FLIR associated with temperature variability due to weather and tank operating conditions. We have
demonstrated  that  for  Comanche  FLIR  the  uncertainty  of  signature  costs  approximately  1/2  bit  of  accuracy  in 
estimation. 

4. Clutter: We have established the fundamental role that higher-order statistics such as kurtosis play in the
degradation in performance of ATR systems.

5. Databases: We have established a national repository for DoD databases from which investigators around the
world can access military data sets (see http://cis.jhu.edu).

Given  in  the  appendix  is  a  detailed  description  of  the  work  and  results  in  the  Center  for  Imaging  Science. 
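
As a small illustration of the statistic named in item 4, the sketch below computes sample excess kurtosis for a Gaussian and for a heavier-tailed (Laplace) sample. The distributions, sample sizes, and function name are illustrative assumptions; they are not the clutter models or data used by the Center.

```python
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0

rng = random.Random(0)
# Gaussian samples: excess kurtosis is 0 in theory.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Laplace samples (difference of two exponentials): excess kurtosis is 3 in theory.
heavy_tailed = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]

print(excess_kurtosis(gaussian), excess_kurtosis(heavy_tailed))
```

A clutter field with markedly positive excess kurtosis produces more extreme values than a Gaussian field of the same variance, which is one intuitive reason such statistics matter for false alarms.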
Summary of the Most Important Results.

Region/Edge-based Target Segmentation of FLIR Images Modeled by Weibull/Gaussian
Distributions. An initial detection algorithm employing Gaussian and Weibull functions to model the
background  is  used  to  robustly  identify  all  possible  regions  in  the  image  that  are  candidate  locations  of 
targets  [1].  A  two-stage  focused  analysis  of  each  candidate  target  location  is  then  performed  to  get  an 
accurate  representation  of  the  target  boundary.  A  region-growing  procedure  that  uses  a  diffusion  process 
driven by the underlying probability distribution of the background and modulated by local shape changes
of  the  target  is  used  to  estimate  the  target  shape.  The  boundary  of  the  target  is  then  combined  with  salient 
edge  information  in  the  image  to  arrive  at  a  more  accurate  representation  of  the  target  boundary.  A 
computationally  efficient  and  flexible  method  to  incorporate  the  salient  edge  information  into  the  region 
boundary  has  been  developed  by  formulating  it  as  a  Bayesian  classification  problem.  Finally,  to  reduce 
the false alarm rate, a higher-level interpretation module is used to classify the detected areas as man-made
or natural objects using geometric and FLIR intensity-based features extracted from the target.
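
The idea of growing a target region out of a statistically modeled background can be sketched as follows. This is a deliberately simplified stand-in, not the diffusion-driven procedure of [1]: it assumes a Gaussian background whose mean and standard deviation are estimated from the whole image, a seed pixel supplied by the detection stage, and a fixed z-score threshold (`z_min`, a hypothetical parameter). The Weibull model and the edge-based refinement are not represented.

```python
from collections import deque
from statistics import mean, pstdev

def grow_region(image, seed, z_min=2.0):
    """Grow a 4-connected region from `seed`, adding neighbors whose intensity
    is unlikely under a Gaussian background model (z-score >= z_min)."""
    pixels = [v for row in image for v in row]
    mu, sigma = mean(pixels), pstdev(pixels)  # crude whole-image background estimate
    rows, cols = len(image), len(image[0])
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if (image[nr][nc] - mu) / sigma >= z_min:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Toy 6x6 "FLIR" frame: mildly varying background with a hot 2x2 target.
image = [[10 + (r + c) % 3 for c in range(6)] for r in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 50
target = grow_region(image, seed=(2, 2))
print(sorted(target))
```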

Bayesian Recognition of Targets by Parts in 2nd-Generation FLIR Images. We have developed a
hierarchical recognition strategy for classification and recognition of 2D targets in 2nd-generation FLIR
images  that  uses  salient  object  parts  as  cues  for  recognition  [2].  At  the  lowest  level,  classifiers  are  trained 
to  recognize  the  class  of  an  input  object,  while  at  the  next,  classifiers  are  trained  to  recognize  specific 
objects.  At  each  level,  objects  are  recognized  by  their  parts.  Each  classifier  is  made  up  of  modules,  each 
of  which  is  an  expert  on  a  specific  part  of  the  object.  Each  modular  expert  is  trained  to  recognize  one  part 
under  different  viewing  angles  and  transformations.  A  Bayesian  realization  has  been  developed  in  which 
the expert modules present the probability density functions of each part, modeled by a mixture of
densities  to  incorporate  different  views  (aspects)  of  each  part.  Recognition  relies  upon  the  sequential 
presentation  of  the  parts  to  the  system  without  the  use  of  any  information  on  relationships  between  the 
parts. Part modules are weighted in the recognition process according to how
discriminating they are. Recognition results are obtained using a recursive
Bayesian  updating  rule.  The  advantage  of  such  a  system  is  its  ability  to  sequentially  examine  different 
parts  of  an  object  and  modify  the  recognition  probability  as  more  parts  are  seen.  Since  each  part  is 
represented  by  an  expert  module,  recognition  is  faster  than  when  using  a  matching  algorithm,  and  pose 
estimation  is  simplified.  We  have  also  developed  a  new  method  to  decompose  a  target  into  its  parts 
using  its  outline  or  boundary.  This  is  based  on  the  premise  that  different  parts  of  an  object  show  up  as 
distinct  surfaces  in  the  image.  By  finding  these  surfaces,  the  parts  of  the  target  can  be  determined.  To 
obtain these surfaces, cues such as edges, corner parts, and T-junctions that suggest the existence of a
surface are determined from the target's outline. A linear diffusion approach is used to determine the
surface  segments  from  the  sets  of  cues.  Segments  are  then  grouped  into  parts.  In  the  case  where  the 
underlying  distribution  of  parts  is  not  readily  obtainable,  neural  network  techniques  may  provide  a 
suitable alternative. The methodology for recognition by parts could easily be extended to 3D objects. If
3D parts were represented using primitives such as cylinders, hyperellipsoids, etc., the dimension of the
shape feature would be much lower than is required for 2D objects.
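
The recursive Bayesian updating described above can be sketched as follows. The class names and the part likelihood values are hypothetical, and in the actual system the likelihoods would come from the mixture-density expert modules; the sketch also treats parts as conditionally independent given the class, in line with the statement that no relationships between parts are used, and it omits the discriminability weighting.

```python
def bayes_update(prior, likelihoods):
    """One recursive step: posterior(c) is proportional to prior(c) * p(part | c)."""
    unnormalized = {c: prior[c] * likelihoods[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

# Start from a uniform prior over two hypothetical target classes.
posterior = {"tank": 0.5, "truck": 0.5}

# Hypothetical values of p(observed part | class), one dict per part seen.
observed_parts = [
    {"tank": 0.6, "truck": 0.4},  # a weakly discriminating part
    {"tank": 0.9, "truck": 0.1},  # a strongly discriminating part
]

for part_likelihoods in observed_parts:
    posterior = bayes_update(posterior, part_likelihoods)
    print(posterior)
```

Each new part sharpens or weakens the current belief, so recognition can stop as soon as one class probability is high enough.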

Experimental results on 1,930 FLIR images showed that our automatic target detection/recognition
system can achieve recognition with a high degree of accuracy and a low false alarm rate [3]. Outdoor
field FLIR images were used as input, in which the military targets were at a distance of 2-3.5 kilometers
from the sensor and thus occupied <5% of the image pixels. A total of 1,930 images from 28 datasets
were used, ranging in quality from poor to good to excellent. All images were obtained under various
ambient scene and weather conditions. In the detection stage, 89% of the targets were correctly located,
with a false alarm rate of <5%. Of the detected regions, 90% could be correctly segmented. For most datasets,
70%  of  the  target  types  were  correctly  reported,  with  an  80%  rate  of  pose  recognition. 

Detecting  Moving  Objects  in  Airborne  FLIR  Sequences.  We  have  developed  a  methodology  to  detect 
independently  moving  objects  in  FLIR  image  sequences  taken  from  an  airborne,  moving  platform  [4-6]. 
Ego-motion  effects  are  removed  through  a  robust  multi-scale  affine  image  registration  process. 
Consequently,  areas  with  residual  motion  indicate  object  activity.  These  areas  are  detected,  refined  and 
selected  using  a  Bayesian  classifier.  The  remaining  regions  are  clustered  into  pairs.  Each  pair  represents 
an object's front end or rear end. Using motion and scene knowledge we estimate object pose and
establish a region-of-interest for each pair. Edge elements within each region of interest are used to
segment the convex cover containing the independently moving object. Our experiments used real,
complex, cluttered and noisy sequences.

This  robust  system  is  designed  for  integration  into  a  comprehensive  automatic  target  recognition  (ATR) 
and  action  classification  system.  This  dynamic  scene  analysis  system  could  be  integrated  into  existing 
static  image  ATR  systems.  It  could  be  used  in  a  Bayesian  sensor  fusion  paradigm  to  improve  detection 
accuracy and reduce false alarms. In such a fusion stage, detection, recognition and pose results from
cues  (such  as  motion,  target  shape,  size  or  parts)  could  be  integrated  using  a  Bayesian  meta-classifier. 

The  different  paradigms  could  be  used  to  mutually  verify  results  and  synergetically  improve  performance. 
Compared  to  existing  systems,  dynamic  scene  analysis  enables  the  inclusion  of  target  action  recognition. 
This  action  recognition  could  enable  the  automatic  extraction  of  multi-frame  analysis  results,  such  as 
object  starts  and  stops  or  changes  in  acceleration  or  direction. 

3D Reconstruction of an Urban Scene from Synthetic Fish-eye Images. Fish-eye stereo analysis has
many promising applications in multisensor computer vision. We have explored the feasibility of
generating a 3D model of an urban scene from a pair of stereo images taken by full-circular fish-eye lenses
from different views [7]. As in the stereo analysis of pinhole camera images, correspondences must be
established and depth recovered by triangulation, but both steps must be adapted to the nonlinear fish-eye
transformation model. Our reconstruction algorithm is based on an error-free equidistance projection
model.  Our  algorithm  was  tested  on  synthetic  data. 
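
For reference, the equidistance fish-eye model maps a ray at angle theta from the optical axis to an image radius r = f * theta. The sketch below projects a 3D point and recovers the viewing ray from an image point under that model. The camera frame (optical axis along +z), the parameter `f` (focal length in pixel units), and the function names are assumptions made for illustration; the correspondence and triangulation steps of [7] are not shown.

```python
import math

def project(point, f):
    """Equidistance projection of a 3D point (camera frame, optical axis +z)."""
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)  # angle from the optical axis
    phi = math.atan2(y, x)                   # azimuth around the axis
    r = f * theta                            # equidistance model: r = f * theta
    return (r * math.cos(phi), r * math.sin(phi))

def back_project(uv, f):
    """Unit viewing ray corresponding to image point (u, v)."""
    u, v = uv
    theta = math.hypot(u, v) / f
    phi = math.atan2(v, u)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

f = 200.0
uv = project((1.0, 2.0, 2.0), f)
ray = back_project(uv, f)
print(uv, ray)
```

Unlike the pinhole model, a point 90 degrees off axis still lands at a finite radius (f * pi / 2), which is what lets a full-circular lens cover a hemisphere.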

Structure in Content-based Image Retrieval. We have compared the use of structure in content-based
image retrieval (CBIR) with approaches based on histogram and texture analysis in the context of
locating images containing manmade objects. The advantage of using structure in such queries was
demonstrated by analyzing an image database of monocular grayscale outdoor images to retrieve images
containing buildings [8]. A methodology based on the principles of perceptual grouping in a Bayesian
framework  has  been  developed  [9].  Higher-level  and  lower-level  vision  methodologies  have  been 
combined  for  enhanced  performance  [10].  A  lower-level  analysis  module  is  used  to  increase  the 
capability of the higher-level module. Higher-level analysis is performed globally: the principles of
perceptual grouping are applied to primitive image features to extract structure in the form of different
shape representations, from which higher-level features are derived. Lower-level analysis is performed globally using Gabor
filters  to  extract  texture  features.  A  manmade  object  region  of  interest  is  used  as  a  frame  for  conducting 
lower-level  analysis,  although  such  analysis  is  not  confined  to  the  region  of  interest.  A  channel  energy 
model  is  utilized  to  extract  lower-level  feature  vectors  consisting  of  fractional  energies  in  various  spatial 
channels. The results obtained by the higher-level analysis module using a Bayesian classifier were refined
and  enhanced  by  the  results  obtained  by  the  lower-level  analysis  module  using  a  nearest  neighbor 
classifier.  Experimental  results  document  the  enhanced  recall  and  precision  rates  obtained  using  this 
combined  approach. 
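
The lower-level feature vector described above, fractional energies across spatial channels followed by a nearest neighbor decision, can be sketched as follows. The channel responses are assumed to have been computed already (for example by a Gabor filter bank, which is not implemented here), and the numbers, labels, and function names are hypothetical.

```python
def fractional_energies(channel_responses):
    """Energy of each channel divided by the total energy over all channels."""
    energies = [sum(v * v for v in response) for response in channel_responses]
    total = sum(energies)  # assumed non-zero
    return [e / total for e in energies]

def nearest_neighbor(query, labeled_features):
    """Label of the stored feature vector closest to `query` (squared Euclidean)."""
    def distance(item):
        features, _label = item
        return sum((q - f) ** 2 for q, f in zip(query, features))
    return min(labeled_features, key=distance)[1]

# Hypothetical filter outputs for one image: two channels, two samples each.
features = fractional_energies([[1.0, 1.0], [2.0, 0.0]])

# Hypothetical labeled examples in the same two-channel feature space.
database = [([0.30, 0.70], "manmade"), ([0.80, 0.20], "natural")]
print(features, nearest_neighbor(features, database))
```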

It  has  been  noted  that  many  of  the  perceptually  salient  image  properties,  such  as  collinearity,  parallelism, 
and  line  continuation,  are  viewpoint  invariant.  Certain  scene  structures  will  always  produce  images  with 
discernible features, regardless of viewpoint, while other scene structures virtually never produce these
properties.  This  correlation  between  salience  and  invariance  has  suggested  that  the  perceptual  salience  of 
viewpoint  invariance  is  due  to  the  leverage  it  provides  for  inferring  geometric  properties  of  objects  and 
scenes.  The  perceptual  inference  and  grouping  process  and  color  histogram  are  isotropic  mappings  that 
are  invariant  to  the  isometries  of  geometrical  objects.  Isotropic  mappings,  acting  on  isometries  of 
perceptually  salient  structures,  are  useful  in  image  retrieval  as  they  illustrate  the  similarity  of  structures 
present  in  different  images.  Anisotropic  mappings,  such  as  texture  analysis  response  obtained  from  a 
channel  energy  model,  can  determine  image  uniqueness.  The  premise  of  our  model,  which  is  consistent 
with the functioning of the human visual system, is to extract rich descriptions of lower-level anisotropic
local  image  structure,  and  use  these  descriptions  for  subsequent  grouping  into  higher-level  isotropic 
features. 

The  latest  implementation  of  the  system  [11-13]  is  able  to  serve  queries  ranging  from  scenes  of  purely 
natural  objects  such  as  vegetation,  trees  and  sky,  to  images  containing  conspicuous  structural  objects  such 
as  buildings,  towers  and  bridges.  The  system  was  tested  on  an  image  database  consisting  of  2660  24-bit 
color  images.  When  tested  on  a  query  image  of  a  flower,  the  system  successfully  eliminated  images  that 
contained  significant  manmade  structures  even  when  they  had  similar  color  distribution  and/or  texture 
patterns,  and  retrieved  only  images  that  also  contained  flowers,  leaves  and  grass.  Likewise,  a  query 
image of a building façade retrieved images similar in both lower-level vision content (histogram and
texture  patterns)  and  higher-level  vision  content  (semantics  describing  structure).  The  results  emphasize 
the  efficacy  of  using  a  combination  of  isotropic  and  anisotropic  features. 

Visualization and Classification of High-Dimensional Data Using Nonlinear Manifolds. This is a
fundamental problem in object recognition and image analysis. Our seminal work and breakthrough in
this area appeared as a 22-page paper in IEEE Transactions on Pattern Analysis and Machine Intelligence
in January 2001 [14]. We first modified the best-known nonlinear manifolds, namely, the principal
curve  and  principal  surface  [15].  The  modification  involves  orienting  and  clipping  the  covariances  at  each 
of  the  manifold  nodes  such  that  variances  in  directions  tangential  to  the  manifold  are  minimized.  The 
motivation  behind  this  modification  lies  in  the  desire  to  recover  and  approximate  the  projection  step  of  the 
original principal curve algorithm in current probabilistic principal surface formulations. Experiments on
artificial  and  real  datasets  suggest  that  this  modification  does  indeed  lead  to  a  vast  improvement  in 
convergence  speed  and  better  generalization  properties  for  principal  surfaces.  Subsequently,  we  pioneered 
the use of spherical manifolds for the simultaneous classification and pose estimation of 3-D objects from
2-D  images  [16].  The  spherical  manifold  imposes  a  local  topological  constraint  on  samples  that  are  close 
to  each  other,  while  maintaining  a  global  structure.  Each  node  on  the  spherical  manifold  also  corresponds 
nicely  to  a  pose  on  a  viewing  sphere  with  2  degrees  of  freedom.  The  proposed  system  has  been 
successfully  applied  to  tank  and  aircraft  classification  and  pose  estimation. 

Modular  Learning  Through  Output  Space  Decomposition,  with  Applications  to  Classification  of 
Hyperspectral  Sensor  Data.  Several  recognition  problems  pertinent  to  ARO  involve  a  large  number  of 
potential classes. In addition, some of the new sensors, most notably hyperspectral sensors, which provide
about 200 spectral bands per “pixel” and are being rapidly deployed in several DoD remote sensing and
surveillance applications, pose the additional challenge of a high-dimensional input space. While feature
selection/extraction  techniques  are  often  used  to  simplify  the  input  space  and  alleviate  the  curse  of 
dimensionality,  modular  learning  paradigms  based  on  the  divide  and  conquer  precept  are  used  to 
decompose  the  problem  into  simpler  classification  tasks  through  input  space,  training  set  or  feature  space 
decomposition.  We  have  developed  in  detail  an  output-space  decomposition  framework  in  which  a  C>2 
class  problem  is  systematically  decomposed  into  simpler  two-(meta)class  problems  [17-21].  Apart  from 
improving  generalization  performance  for  difficult  classification  problems,  such  problem  decomposition 
in  output  space  allows  class  specific  feature  extraction,  and  yields  significant  domain  knowledge  that  is 
not  possible  to  obtain  from  conventional  single  classifiers  or  modular  learning  paradigms. 

Two  frameworks  for  problem  decomposition  in  output  space  were  developed.  In  the  first  framework, 
called  the  Pairwise  Classifier  (PC)  framework,  a  C-class  problem  is  exhaustively  decomposed  into  a  set 
of  C  choose  2  two-class  problems  [18].  Features  that  best  discriminate  the  two  classes  are  extracted  for 
each  pairwise  classifier  and  the  outputs  of  all  these  classifiers  are  combined  to  yield  the  final  output  in  the 
original  output  space.  The  PC  framework  is  applied  to  the  problem  of  landcover  prediction  involving  high 
dimensional  hyperspectral  sensor  data  that  can  be  treated  as  a  signal  with  high  correlation  among  adjacent 
spectral  bands.  Top-down  and  bottom-up  multiresolution  feature  extraction  algorithms  for  such 
specialized  sensors  are  also  developed  for  two-class  problems.  Used  in  conjunction  with  the  PC 
framework,  these  feature  extraction  algorithms  yielded  significant  improvements  in  classification 
accuracy  and  discovery  of  useful  domain  knowledge  consistent  with  experts'  opinion,  for  a  variety  of 
datasets  [21, 17]. 
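
A minimal sketch of the pairwise decomposition: one two-class decision per unordered class pair, combined here by majority vote. The nearest-mean rule, the vote-based combiner, the class names, and the 2-D class means are all illustrative assumptions; the framework in [18] extracts pair-specific features for each classifier, which this sketch does not do.

```python
from collections import Counter
from itertools import combinations

def squared_distance(x, m):
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

def class_pairs(class_means):
    """All C-choose-2 unordered class pairs, one two-class problem each."""
    return list(combinations(sorted(class_means), 2))

def classify(x, class_means):
    """Run every pairwise nearest-mean classifier and return the majority vote."""
    votes = Counter()
    for a, b in class_pairs(class_means):
        closer_to_a = squared_distance(x, class_means[a]) <= squared_distance(x, class_means[b])
        votes[a if closer_to_a else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical landcover classes with 2-D feature means.
means = {"water": (0.0, 0.0), "forest": (5.0, 5.0), "crop": (5.0, 6.0), "urban": (10.0, 1.0)}
print(len(class_pairs(means)), classify((4.8, 5.9), means))
```

With C = 4 classes this builds 6 pairwise problems; the count grows quadratically with C, which is the cost the hierarchical framework described next avoids.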

The  second  framework  for  problem  decomposition  in  output  space  [20],  called  the  Binary  Hierarchical 
Classifier  (BHC)  framework,  involves  the  decomposition  of  a  C-class  problem  into  a  binary  tree  with  C 
leaf  nodes  and  C-1  internal  nodes.  Each  internal  node  is  comprised  of  a  feature  extractor  and  a  classifier 
that  discriminates  between  the  two  meta-classes  represented  by  its  two  children.  Both  bottom-up  (BU- 
BHC)  and  top-down  (TD-BHC)  approaches  for  automatically  building  such  a  BHC  are  developed.  The 
BU-BHC  is  built  by  applying  agglomerative  clustering  to  the  set  of  C  classes  while  the  TD-BHC  is  built 
by  recursively  partitioning  a  set  of  classes  at  any  internal  node  into  two  disjoint  groups  or  meta-classes. 
The  coupled  problems  of  finding  a  good  partition  and  of  searching  for  a  linear  feature  extractor  that  best 
discriminates  the  two  resulting  meta-classes  are  solved  simultaneously  at  each  stage  of  the  recursive 
algorithm.  The  BHC  framework  not  only  reduces  the  number  of  two-class  classifiers  from  C-choose-2  in 
the  PC  framework  to  only  C-1,  but  it  also  discovers  domain  knowledge  with  regard  to  the  class  taxonomy 
and the features that discriminate classes at each internal node. For difficult high-dimensional
classification  problems,  a  significant  improvement  in  classification  accuracy  over  conventional  classifiers 
is  also  obtained  by  the  BHC  classifiers  [20].  A  full  paper  detailing  this  approach  and  its  successful 
application  to  a  wide  variety  of  pattern  recognition  problems  is  being  prepared  for  PAMI. 
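
The bottom-up construction can be sketched as agglomerative clustering over class means, which yields a binary tree with C leaves and C-1 internal nodes. The centroid-distance merge criterion, the class names, and the class means are assumptions for illustration only; the feature extractor and classifier that [20] places at each internal node are omitted.

```python
def build_bottom_up_tree(class_means):
    """Repeatedly merge the two closest clusters; leaves are class names and
    each internal node is a 2-tuple (left_subtree, right_subtree)."""
    # Each cluster is (tree, centroid, number_of_classes).
    clusters = [(name, tuple(m), 1) for name, m in class_means.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((a - b) ** 2 for a, b in zip(clusters[i][1], clusters[j][1]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ti, ci, ni), (tj, cj, nj) = clusters[i], clusters[j]
        centroid = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
        merged = ((ti, tj), centroid, ni + nj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

def count_internal_nodes(tree):
    if isinstance(tree, str):  # a leaf holds a single class name
        return 0
    return 1 + count_internal_nodes(tree[0]) + count_internal_nodes(tree[1])

# Hypothetical classes with 2-D feature means.
means = {"water": (0.0, 0.0), "forest": (5.0, 5.0), "crop": (5.0, 6.0), "urban": (10.0, 1.0)}
tree = build_bottom_up_tree(means)
print(tree, count_internal_nodes(tree))
```

The nested tuples double as a class taxonomy: classes merged early (here the two vegetation classes) are the ones hardest to tell apart at a coarse level.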

Visualization  of  RBF  Networks.  RBF  networks  are  one  of  the  most  powerful  and  popular  neural 
network models, and have numerous applications of interest to ARO. A major problem with neural
network models is the lack of interpretability. We have therefore devised a powerful method for the 3D
visualization  of  the  structure  of  radial  basis  function  networks  [22].  This  method  allows  the  visualization 
of  basis  function  characteristics  (centers  and  widths)  along  with  second  level  weights.  Network  properties 
can  be  displayed  simultaneously  with  the  training  data  or  test  data  in  the  same  input  space.  Principal 
component  analysis  is  used  to  transform  the  input  data  so  that  its  most  salient  dimensions  can  be 
visualized.  This  method  also  allows  changes  made  while  graphically  editing  the  network  structure,  in 
transformed  space,  to  be  projected  back  into  the  original  input  space. 

Design  and  Control  of  Large  Collections  of  Learning  Agents.  This  research  focuses  on  the  problem  of 
designing  groups  of  autonomous  agents  that  individually  learn  sequences  of  actions  such  that  the  resultant 
sequence  of  actions  achieves  a  predetermined  global  objective.  We  are  particularly  interested  in  instances 
of  this  problem  where  centralized  control  is  either  impossible  or  impractical. 

For  single  agent  systems  in  similar  domains,  machine  learning  methods  (e.g.,  reinforcement  learners) 
have  been  successfully  used.  However,  applying  such  solutions  directly  to  multi-agent  systems  often 
proves  problematic,  as  agents  may  work  at  cross-purposes,  or  have  difficulty  in  evaluating  their 
contribution to achievement of the global objective, or both. Accordingly, the crucial design step in
multi-agent systems centers on determining the private objectives of each agent so that as the agents strive for
those objectives, the system reaches a good global solution. In this work we consider a version of this
problem involving multiple autonomous rovers, where the global objective is to maximize the aggregate
information collected by that set of rovers. We employ concepts from collective intelligence (COIN) to design the
goals for each rover. Specifically, we tackled the problem of controlling multiple planetary exploration
vehicles (e.g., rovers on the surface of Mars) such that their collective behavior maximizes the total
information collected. In this domain, we used COIN theory to address the critical issue of what utility
functions those agents should strive to maximize. Previous applications of COIN theory focused on
maximizing rewards (i.e., single time step utility values) rather than time-extended utilities. In this work
we extend these results to a problem where agents need to take sequences of actions, and use
Q-learning with utilities derived from COIN theory. Our results demonstrate that reinforcement learning (RL)
rovers using COIN-derived goals outperform both “natural” extensions of single agent algorithms and global
reinforcement learning solutions based on “team games.” Currently we are considering macro-learning techniques,
which involve learning other than each agent’s reinforcement optimization of its private utility. We are also
studying learning under a variety of communication restrictions, e.g., being able to observe only a
subset of the other agents. Breakthroughs in both areas will make this approach applicable to a wide
range of DoD problems involving multiple agent systems.
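
The combination of Q-learning with a privately assigned utility can be sketched as follows. The tabular update is the standard Q-learning rule. The private utility shown is a difference-style utility (global utility minus the global utility with that rover removed), which is one form discussed in the collective intelligence literature; the report does not state which utility was actually used, so this choice, along with the site values, rover names, and learning parameters, is an assumption.

```python
from collections import defaultdict

def global_utility(visits, site_values):
    """Total information collected: each site counts once, however many rovers visit it."""
    seen = set().union(*visits.values())
    return sum(site_values[s] for s in seen)

def difference_utility(agent, visits, site_values):
    """Assumed private utility: global utility minus global utility without `agent`."""
    without = {a: v for a, v in visits.items() if a != agent}
    return global_utility(visits, site_values) - global_utility(without, site_values)

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step; returns the updated Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

site_values = {"A": 3.0, "B": 5.0}
visits = {"rover1": {"A"}, "rover2": {"A", "B"}}

# rover1 only duplicates a site rover2 also covers, so its private utility is 0.
r1 = difference_utility("rover1", visits, site_values)
r2 = difference_utility("rover2", visits, site_values)

q = defaultdict(float)
updated = q_update(q, "s0", "go", r2, "s1", ["go", "stay"])
print(r1, r2, updated)
```

A team-game reward would hand both rovers the same global value of 8, whereas the difference-style utility credits each rover only for what it adds, which makes an individual rover's contribution easier to learn from.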

Global Optimal Surface from Stereo. We have developed an effective globally optimized stereo
matching approach that produces a dense displacement map and an occlusion map [23]. The global
matching cost and various constraints, including matching uniqueness and ordering, and local smoothness
along and across epipolar lines, are all cast into a novel configuration of a maximum flow graph. The
correspondence between the associated minimum cut and the defined stereo problem guarantees a globally
optimal disparity solution subject to those geometric constraints, while still preserving
discontinuities.  Different  similarity  measures  can  be  applied  to  this  framework  to  deliver  more  reliable 
matching  results.  The  capacities  of  the  edges,  which  model  the  smoothness  and  occlusion,  depend  on  the 
quality of the input images and the particular structure of the surface. An intuitive approach is to use the edge
and  junction  maps  as  a  cue  to  adapt  these  capacities  to  the  area  discontinuity,  since  occlusion  and 
discontinuities  are  more  likely  to  occur  in  the  presence  of  edges  and  junctions.  More  sophisticated  area 
analysis  or  segmentation  will  help  the  computation  of  the  smoothness  arc  capacity  and  improve  the  depth 
estimation.  A  multiple  resolution  approach  can  be  directly  embedded  into  the  graph  to  reduce  the 
computational  complexity. 
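
The formulation rests on the duality between maximum flow and minimum cut. The sketch below illustrates that duality on a toy graph using the Edmonds-Karp algorithm; it is not the stereo graph construction of [23], and the node names and capacities are hypothetical. In the stereo setting the edge capacities would encode matching costs and the smoothness and occlusion penalties.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp: returns (max flow value, source side of a minimum cut)."""
    # Residual graph with a reverse edge for every forward edge.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path remains: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes still reachable from the source form the source side of a minimum cut.
    return flow, set(parent)

capacity = {
    "s": {"a": 3, "b": 2},
    "a": {"b": 1, "t": 2},
    "b": {"t": 3},
}
flow, source_side = max_flow_min_cut(capacity, "s", "t")
print(flow, source_side)
```

The edges leaving the returned source side are the cut; their total capacity equals the flow value, which is the guarantee of global optimality the stereo formulation exploits.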

Blind Image Deconvolution. In this project, we addressed the problem of blind image deconvolution,
where neither the original image nor the degrading systems (assumed linear time-invariant, LTI) are known [24-27]. This
problem is of high interest for applications such as imaging objects from the air, imaging objects in the air
from the ground, and other applications where little is known about the degrading channel or
system. We assume that multiple blurred versions of the image are available. The blurs are assumed to be
different, e.g., taken at different times through a turbulent medium. This problem is called multichannel
blind deconvolution. We solved a nullspace-based multichannel blind image restoration problem using
matrix  operations.  We  posed  the  problem  as  a  constrained  optimization  problem.  By  using  different 
constraints,  different  optimization  problems  were  formulated.  One  of  these  can  be  solved  by  matrix 
operations  alone. 
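
One relation that commonly underlies nullspace-based multichannel methods is the cross-relation: if y1 = h1 * x and y2 = h2 * x for the same unknown x, then y1 * h2 = y2 * h1, so the stacked blur vector lies in the nullspace of a matrix built from the observations alone. The report does not spell out its formulation, so treating this as the relevant identity is an assumption. The sketch checks the identity in a noise-free 1-D case with hypothetical signals.

```python
def convolve(a, b):
    """Full linear convolution of two 1-D sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# Unknown original signal and two different FIR blurs (all values hypothetical).
x = [1.0, 4.0, 2.0, 5.0, 3.0]
h1 = [0.5, 0.3, 0.2]
h2 = [0.1, 0.6, 0.3]

# The two observed, blurred channels.
y1 = convolve(x, h1)
y2 = convolve(x, h2)

# Cross-relation: y1 * h2 equals y2 * h1 because convolution commutes.
lhs = convolve(y1, h2)
rhs = convolve(y2, h1)
print(max(abs(l - r) for l, r in zip(lhs, rhs)))
```

Written in matrix form, y1 * h2 - y2 * h1 = 0 becomes [Y1, -Y2][h2; h1] = 0, where Y1 and Y2 are convolution matrices of the observations, so the blurs can be sought in that nullspace without knowing x.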

The formulation of the different optimization problems implies a new column-space-based algorithm. The
images restored by this new algorithm and by the nullspace-based one are the same. This new algorithm has
the same advantages as the nullspace-based one, such as exact restoration and no noise amplification.
Furthermore, the new algorithm has much lower computational complexity than the nullspace-based
one. In fact, under some mild conditions, the complexity of this new algorithm is equal to that of the FFT.

Another  eigenstructure-based  direct  multichannel  blind  image  restoration  algorithm  is  direct  deconvolver 
estimation. We also formulated it as an optimization problem and made a connection between it and the
new algorithm. By using a different constraint and putting some weighting on the objective function of
the optimization problem, the direct deconvolver estimation approach is shown to be equivalent to the new
algorithm. 

We  thoroughly  studied  eigenstructure-based  techniques  for  direct  multichannel  blind  image  restoration. 
The LTI FIR model was used, and the size of the blur channels was assumed known in these techniques. These
limitations should be removed in the future. Further, we should move on to solve nonlinear and/or
non-time-invariant problems.